Mirror of linear changes by kwyss-nvidia · Pull Request #2 · kwyss-nvidia/TransformerEngine

kwyss-nvidia · 2025-03-12T00:39:17Z

A more reviewable mirror of the changes from NVIDIA#1559

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Use dummy wgrads for lower memory consumption
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Bug fix to avoid sharing gradients.
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Disable automatic use of batch_p2p_comm for CP2
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* Change weight to origin_weight for LN_LINEAR
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Signed-off-by: zhongboz <zhongboz@nvidia.com>

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

@ptrendx

* Minor stylistic tweaks and typo fixes
  Review suggestions from @ptrendx
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix incorrect col strides for MXFP8 matrices
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Apply MR comment change.
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: kwyss-nvidia <kwyss@nvidia.com>

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* scaling enum abstract
* rm NVTE_ from ScalingMode names
* rework scaling mode enum in grouped gemm
* fix norm sharding
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

…r op backward (NVIDIA#1646)
Explicitly specify quantized tensor usages needed for linear op backward
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Debug checkpointing with te.Sequential
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Signed-off-by: Xin Yao <yaox12@outlook.com>

kwyss-nvidia mentioned this pull request Mar 12, 2025

Blockwise scaling linear quantization recipe NVIDIA/TransformerEngine#1559

Merged

13 tasks

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch 2 times, most recently from 07d55ea to 8bb7d63 on March 12, 2025 21:39

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from c3eebe7 to b848509 on March 12, 2025 21:42

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch from 8bb7d63 to 365a4d9 on March 13, 2025 00:06

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from b848509 to 1058efc on March 13, 2025 00:07

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch 3 times, most recently from 6c70366 to 08aa4de on March 15, 2025 00:22

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch 2 times, most recently from eee37bf to ce4ca80 on March 17, 2025 17:24

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch 2 times, most recently from 51fbe41 to 78c194d on March 17, 2025 17:33

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from ce4ca80 to 5ebc93a on March 19, 2025 22:42

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch from 78c194d to 8f4f0f0 on March 19, 2025 22:43

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch 2 times, most recently from 1d112ac to 48648a9 on April 1, 2025 19:43

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from 5aa279e to 8466c36 on April 1, 2025 19:45

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch from ca005ab to e35f2b6 on April 1, 2025 21:46

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from 8466c36 to e788ca2 on April 1, 2025 21:48

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch 2 times, most recently from 22828fe to 413331d on April 1, 2025 23:23

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from e788ca2 to 9ac89ea on April 1, 2025 23:23

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch 4 times, most recently from db5b49e to 8d59b0a on April 2, 2025 18:52

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from 9ac89ea to fa019d5 on April 2, 2025 18:53

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch from 8d59b0a to 3424dc7 on April 2, 2025 19:19

kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from fa019d5 to cd3e414 on April 2, 2025 19:20

kwyss-nvidia and others added 14 commits April 7, 2025 16:50

Set usage before BF16 gather.

a21e65b

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Merge remote-tracking branch 'origin/main' into HEAD

e6ad90e

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

refactor for nvte_quantize_v2

0e8d324

Signed-off-by: zhongboz <zhongboz@nvidia.com>

Format code.

e077601

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Cleanup nvte_quantize_v2

07a70b8

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Test fp32 scales.

64f2601

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Disable CUDA graph.

3cb712c

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Merge remote-tracking branch 'origin/main' into HEAD

6f84d2c

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Simplify layernorm linear

07a563b

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Cleanup layernorm linear.

9a3abe2

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

LayerNorm linear bwd gather logic.

27d9922

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Communication updates.

b62d555

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch from d7775fc to b62d555 on April 8, 2025 23:35

kwyss-nvidia and others added 2 commits April 8, 2025 16:39

Update transformer_engine/pytorch/ops/op.py

196cd6d

Apply MR comment change.
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: kwyss-nvidia <kwyss@nvidia.com>

Lint fix.

67e790b

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch from 8fc753d to 67e790b on April 9, 2025 00:05

MR feedback.

ea9e46b

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

kwyss-nvidia force-pushed the kwyss/subchannel_recipe_linear branch from 6948759 to ea9e46b on April 9, 2025 01:32

kwyss-nvidia and others added 10 commits April 8, 2025 18:49

Enable cuda graph tests.

324792b

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Reduce chance of spurious failure and reword.

54e7279

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

[JAX] Scaling Enum Abstracting (NVIDIA#1655)

962d9c5

* scaling enum abstract
* rm NVTE_ from ScalingMode names
* rework scaling mode enum in grouped gemm
* fix norm sharding
---------
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>

[PyTorch] Explicitly specify quantized tensor usages needed for linea…

20e95ba

…r op backward (NVIDIA#1646)
Explicitly specify quantized tensor usages needed for linear op backward
Signed-off-by: Tim Moon <tmoon@nvidia.com>

Review suggestions from @timmoon10

0bf7844

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into kwyss/subchannel_recipe_linear

62662ae

Update CPP tests.

7efac72

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

Update common.h

c3ee3d8

Signed-off-by: Xin Yao <yaox12@outlook.com>

Update test_float8blockwisetensor.py

59cb49c

Signed-off-by: Xin Yao <yaox12@outlook.com>

kwyss-nvidia wants to merge 56 commits into kwyss/cublas_gemm_github_mr from kwyss/subchannel_recipe_linear

kwyss-nvidia commented Mar 12, 2025


7 participants
